Age matrix for queue dispatch order

ABSTRACT

An apparatus for queue allocation. An embodiment of the apparatus includes a dispatch order data structure, a bit vector, and a queue controller. The dispatch order data structure corresponds to a queue. The dispatch order data structure stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the queue to indicate a write order of the entries in the queue. The bit vector stores a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The queue controller interfaces with the queue and the dispatch order data structure. The queue controller excludes at least some of the entries from a queue operation based on the mask values of the bit vector.

BACKGROUND

An instruction scheduling queue is used to store instructions prior to execution. There are many different ways to manage the dispatch order, or age, of instructions in an instruction scheduling queue. A common queue implementation uses a first-in-first-out (FIFO) data structure. In this implementation, instruction dispatches arrive at the tail, or end, of the FIFO data structure. A look-up mechanism finds the first instruction ready for issue from the head, or start, of the FIFO data structure.

In conventional out-of-order implementations, instructions are selected from anywhere in the FIFO data structure. This creates “holes” in the FIFO data structure at the locations of the selected instructions. To maintain absolute ordering of instruction dispatches in the FIFO data structure (e.g., for fairness), all of the instructions remaining after the selected instructions are shifted forward in the FIFO, and the data structure is collapsed to form a contiguous chain of instructions. Shifting and collapsing the remaining queue entries in this manner allows new entries to be added to the tail, or end, of the FIFO data structure. However, with a high out-of-order issue rate, several instructions are shifted and collapsed every cycle. Hence, maintaining a contiguous sequence of queue entries without “holes” consumes a significant amount of power and processing resources.

SUMMARY

Embodiments of an apparatus are described. In one embodiment, the apparatus is an apparatus for queue allocation. An embodiment of the apparatus includes a dispatch order data structure, a bit vector, and a queue controller. The dispatch order data structure corresponds to a queue. The dispatch order data structure stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the queue to indicate a write order of the entries in the queue. The bit vector stores a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The queue controller interfaces with the queue and the dispatch order data structure. The queue controller excludes at least some of the entries from a queue operation based on the mask values of the bit vector. Other embodiments of the apparatus are also described.

Embodiments of a method are also described. In one embodiment, the method is a method for managing a dispatch order of queue entries in a queue. An embodiment of the method includes storing a plurality of dispatch indicators corresponding to pairs of entries in a queue. Each dispatch indicator is indicative of the dispatch order of the corresponding pair of entries. The method also includes storing a bit vector comprising a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The method also includes performing a queue operation on a subset of the entries in the queue. The subset excludes at least some of the entries of the queue based on the mask values of the bit vector. Other embodiments of the method are also described.

Embodiments of a computer readable storage medium are also described. In one embodiment, the computer readable storage medium embodies a program of machine-readable instructions, executable by a digital processor, to perform operations to facilitate queue allocation. The operations include operations to store a plurality of dispatch indicators corresponding to pairs of entries in a queue. Each dispatch indicator is indicative of the dispatch order of the corresponding pair of entries. The operations also include operations to store a bit vector comprising a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure, and to perform a queue operation on a subset of the entries in the queue. The subset excludes at least some of the entries of the queue based on the mask values of the bit vector. Other embodiments of the computer readable storage medium are also described.

Other aspects and advantages of embodiments of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrated by way of example of the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a schematic block diagram of one embodiment of a plurality of instruction scheduling queues with corresponding dispatch order data structures.

FIG. 2 depicts a schematic diagram of one embodiment of a dispatch order data structure in a matrix configuration.

FIG. 3 depicts a schematic diagram of one embodiment of a sequence of data structure states of the dispatch order data structure shown in FIG. 2.

FIG. 4 depicts a schematic diagram of another embodiment of a dispatch order data structure with masked duplicate entries.

FIG. 5 depicts a schematic diagram of one embodiment of a sequence of data structure states of the dispatch order data structure shown in FIG. 4.

FIG. 6 depicts a schematic diagram of another embodiment of a dispatch order data structure in a partial matrix configuration.

FIG. 7 depicts a schematic diagram of one embodiment of a sequence of data structure states of the dispatch order data structure shown in FIG. 6.

FIG. 8 depicts a schematic block diagram of one embodiment of an instruction queue scheduler which uses a dispatch order data structure.

FIG. 9 depicts a schematic flow chart diagram of one embodiment of a queue operation method for use with the instruction queue scheduler of FIG. 8.

Throughout the description, similar reference numbers may be used to identify similar elements.

DETAILED DESCRIPTION

FIG. 1 depicts a schematic block diagram of one embodiment of a plurality of instruction scheduling queues 102 with corresponding dispatch order data structures 104. In general, the instruction scheduling queues 102 store instructions, or some representative indicators of the instructions, prior to execution. The instruction scheduling queues 102 are also referred to as issue queues. The stored instructions are referred to as entries. It should be noted that although the following description references a specific type of queue (i.e., an instruction scheduling queue), embodiments may be implemented for other types of queues.

Instead of implementing shifting and collapsing operations to continually adjust the positions of the entries in each queue 102, the dispatch order data structure 104 is kept separately from the queue. In one embodiment, each issue queue 102 is a fully-associative structure in a random access memory (RAM) device. The dispatch order data structures 104 are separate control structures that maintain the relative dispatch order, or age, of the entries in the corresponding issue queues 102. An associated instruction scheduler may be implemented as a RAM structure or, alternatively, as another type of structure.

In one embodiment, the dispatch order data structures 104 correspond to the queues 102. Each dispatch order data structure 104 stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the corresponding queue 102. Each dispatch indicator indicates a dispatch order of the entries in each pair.

In one embodiment, the dispatch order data structure 104 stores a representation of at least a partial matrix with intersecting rows and columns. Each row corresponds to one of the entries of the queue, and each column corresponds to one of the entries of the queue. Hence, the intersections of the rows and columns correspond to the pairs of entries in the queue. Since the dispatch order data structure 104 stores dispatch, or age, information, and may be configured as a matrix, the dispatch order data structure 104 is also referred to as an age matrix.

FIG. 2 depicts a schematic diagram of one embodiment of a dispatch order data structure 110 in a matrix configuration. The dispatch order data structure 110 is associated with a specific issue queue 102. The dispatch order of the entries in the queue 102 depends on the relative age of each entry, or when the entry is written into the queue, compared to the other entries in the queue 102. The dispatch order data structure 110 provides a representation of the dispatch order for the corresponding issue queue 102.

The illustrated dispatch order data structure 110 has four rows, designated as rows 0-3, corresponding to entries of the issue queue 102. Similarly, the dispatch order data structure has four columns, designated as columns 0-3, corresponding to the same entries of the issue queue 102. Other embodiments of the dispatch order data structure 110 may include fewer or more rows and columns, depending on the number of entries in the corresponding issue queue 102.

The intersections between the rows and columns correspond to different pairs, or combinations, of entries in the issue queue 102. As described above, each entry of the dispatch order data structure 110 indicates a relative dispatch order, or age, of the corresponding pair of entries in the queue 102. Since there is no relative age difference between an entry in the queue 102 and itself (i.e., where the row and column correspond to the same entry in the queue 102), the diagonal of the dispatch order data structure 110 is unused, or masked. Masked dispatch indicators are designated by an “X.”

For the remaining entries, arrows are shown to indicate the relative dispatch order for the corresponding pairs of entries in the queue 102. As a matter of convention in FIG. 2, the arrow points toward the older entry, and away from the newer entry, in the corresponding pair of entries. Hence, a left arrow indicates that the issue queue entry corresponding to the row is older than the issue queue entry corresponding to the column. In contrast, an upward arrow indicates that the issue queue entry corresponding to the column is older than the issue queue entry corresponding to the row.

For example, Entry_0 of the queue 102 is older than all of the other entries, as shown in the bottom row and the rightmost column of the dispatch order data structure 110 (i.e., all of the arrows point toward the older entry, Entry_0). In contrast, Entry_3 of the queue 102 is newer than all of the other entries, as shown in the top row and the leftmost column of the dispatch order data structure 110 (all of the arrows point away from the newer entry, Entry_3). By looking at all of the dispatch indicators of the dispatch order data structure 110, it can be seen that the dispatch order, from oldest to newest, of the corresponding issue queue 102 is: Entry_0, Entry_1, Entry_2, Entry_3.
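The ordering rule above can be checked mechanically. The following Python sketch is illustrative only and assumes a full N×N matrix held as nested lists, where `older[r][c]` is `True` for a left arrow (row entry older than column entry), `False` for an upward arrow, and `None` on the masked diagonal. The helper names `matrix_from_order` and `dispatch_order` are hypothetical.

```python
from typing import List, Optional

# Assumed representation: older[r][c] is True when the row entry r is older
# than the column entry c (left arrow), False when the column entry is older
# (upward arrow), and None on the masked diagonal.
AgeMatrix = List[List[Optional[bool]]]


def matrix_from_order(order: List[int]) -> AgeMatrix:
    """Build a consistent age matrix from an oldest-to-newest list of entries."""
    n = len(order)
    pos = {entry: i for i, entry in enumerate(order)}
    return [[None if r == c else pos[r] < pos[c] for c in range(n)] for r in range(n)]


def dispatch_order(older: AgeMatrix) -> List[int]:
    """Return entry indices from oldest to newest."""
    n = len(older)
    # An entry's rank is the number of other entries it is older than.
    rank = [sum(1 for c in range(n) if c != r and older[r][c]) for r in range(n)]
    return sorted(range(n), key=lambda r: rank[r], reverse=True)


fig2 = matrix_from_order([0, 1, 2, 3])
print(dispatch_order(fig2))  # [0, 1, 2, 3]
```

The oldest entry is older than all N-1 others, the next is older than N-2, and so on, so counting the "older" indicators in each row is enough to recover the full order.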

FIG. 3 depicts a schematic diagram of one embodiment of a sequence 112 of data structure states of the dispatch order data structure 110 shown in FIG. 2. At time T0, the dispatch order data structure 110 has the same dispatch order as shown in FIG. 2 and described above. At time T1, a new entry is written in Entry_0 of the issue queue 102. As a result, the dispatch indicators of the dispatch order data structure 110 are updated to show that Entry_0 is the newest entry in the issue queue 102. Since Entry_0 was previously the oldest entry in the issue queue 102, all of the dispatch indicators for Entry_0 are updated.

At time T2, a new entry is written in Entry_2. As a result, the dispatch indicators of the dispatch order data structure 110 are updated to show that Entry_2 is the newest entry in the issue queue 102. Since Entry_2 was previously older than Entry_3 and Entry_0 at time T1, the corresponding dispatch indicators for the pairs Entry_2/Entry_3 and Entry_2/Entry_0 are updated, or flipped. Since Entry_2 is already marked as newer than Entry_1 at time T1, the corresponding dispatch indicator for the pair Entry_2/Entry_1 is not changed.

At time T3, a new entry is written in Entry_1. As a result, the dispatch indicators of the dispatch order data structure 110 are updated to show that Entry_1 is the newest entry in the issue queue 102. Since Entry_1 was previously the oldest entry in the issue queue 102 at time T2, all of the corresponding dispatch indicators for Entry_1 are updated, or flipped.
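The update rule of FIG. 3 can be sketched in Python under the same assumed nested-list representation (`older[r][c]` is `True` when the row entry is older). Writing an entry makes it the newest, so every indicator in its row is set to "column older" and every indicator in its column to "row older"; indicators that already hold that value are effectively left unchanged. The function names are hypothetical.

```python
from typing import List, Optional

AgeMatrix = List[List[Optional[bool]]]


def new_matrix(n: int) -> AgeMatrix:
    # Assumed initial state (time T0): Entry_0 oldest, Entry_{n-1} newest.
    return [[None if r == c else r < c for c in range(n)] for r in range(n)]


def write_entry(older: AgeMatrix, w: int) -> None:
    """Record a write to entry w, making it the newest entry."""
    for c in range(len(older)):
        if c != w:
            older[w][c] = False  # w is not older than any other entry
            older[c][w] = True   # every other entry is older than w


def dispatch_order(older: AgeMatrix) -> List[int]:
    n = len(older)
    rank = [sum(1 for c in range(n) if c != r and older[r][c]) for r in range(n)]
    return sorted(range(n), key=lambda r: rank[r], reverse=True)


m = new_matrix(4)
for t, entry in enumerate([0, 2, 1], start=1):  # writes at T1, T2, T3
    write_entry(m, entry)
    print(f"T{t}:", dispatch_order(m))
# T1: [1, 2, 3, 0]
# T2: [1, 3, 0, 2]
# T3: [3, 0, 2, 1]
```

The printed orders match the narrative: after T2, Entry_1 is the oldest entry, which is why all of its indicators flip at T3.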

FIG. 4 depicts a schematic diagram of another embodiment of a dispatch order data structure 120 with masked duplicate entries. Since the dispatch indicators above and below the masked diagonal entries are duplicates, either the top or bottom half of the dispatch order data structure 120 may be masked. In the embodiment of FIG. 4, the top portion is masked. However, other embodiments may use the top portion and mask the bottom portion.

FIG. 5 depicts a schematic diagram of one embodiment of a sequence 122 of data structure states of the dispatch order data structure 120 shown in FIG. 4. In particular, the sequence 122 shows how the dispatch indicators in the lower portion of the dispatch order data structure 120 are changed each time an entry in the corresponding queue 102 is changed. At time T1, a new entry is written in Entry_2, and the dispatch indicator for the pair Entry_2/Entry_3 is updated. At time T2, a new entry is written in Entry_0, and the dispatch indicators for all the pairs associated with Entry_0 are updated. At time T3, a new entry is written in Entry_3, and the dispatch indicators for the pairs Entry_3/Entry_0 and Entry_3/Entry_2 are updated. At time T4, a new entry is written in Entry_1, and the dispatch indicators for all of the pairs associated with Entry_1 are updated.
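The FIG. 5 sequence can be replayed with a sketch that keeps only one half of the matrix. This Python example assumes the retained half holds pairs `(r, c)` with `r > c`, that `True` means the row entry is older, and that the queue starts in the T0 order Entry_0 through Entry_3 (oldest first). The hypothetical `write_entry` returns only the indicators whose value actually changes, which is what the figure highlights.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (row, column) with row > column


def init_half(n: int) -> Dict[Pair, bool]:
    # True means the row entry is older than the column entry. With Entry_0
    # oldest, the row (higher index) is always newer, so all start False.
    return {(r, c): False for r in range(n) for c in range(r)}


def write_entry(half: Dict[Pair, bool], w: int) -> List[Pair]:
    """Make entry w the newest; return the pairs whose indicator flipped."""
    flipped = []
    for (r, c), row_older in list(half.items()):
        if w not in (r, c):
            continue
        target = c == w  # w is newest, so the row is older only if w is the column
        if row_older != target:
            half[(r, c)] = target
            flipped.append((r, c))
    return flipped


half = init_half(4)
for t, entry in enumerate([2, 0, 3, 1], start=1):  # writes at T1 .. T4
    print(f"T{t}: write Entry_{entry}, flipped {write_entry(half, entry)}")
# T1: write Entry_2, flipped [(3, 2)]
# T2: write Entry_0, flipped [(1, 0), (2, 0), (3, 0)]
# T3: write Entry_3, flipped [(3, 0), (3, 2)]
# T4: write Entry_1, flipped [(1, 0), (2, 1), (3, 1)]
```

Each flipped pair corresponds to an updated indicator named in the text, for example the single pair Entry_2/Entry_3 at T1.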

FIG. 6 depicts a schematic diagram of another embodiment of a dispatch order data structure 130 in a partial matrix configuration. Instead of masking the duplicate and unused dispatch indicators, the dispatch order data structure 130 only stores one dispatch indicator for each pair of entries in the queue.

In this embodiment, the partial matrix configuration has fewer entries, and may be stored in less memory space, than the previously described embodiments of the dispatch order data structures 110 and 120. In particular, for an issue queue 102 with N entries, the dispatch order data structure 130 may store one dispatch indicator per pair of entries, so the number of dispatch indicators, n, is:

$n = C_{2}^{N} = \frac{N!}{2!\,(N-2)!} = \frac{N(N-1)}{2}$

where n designates the number of pairs of entries of the queue 102, and N designates the total number of entries in the queue 102. For example, if the queue 102 has 4 entries, then the number of pairs of entries is 6. Hence, the dispatch order data structure 130 stores six dispatch indicators, instead of 16 (i.e., a 4×4 matrix) dispatch indicators. As another example, an issue queue 102 with 16 entries has 120 unique pairs, and the corresponding dispatch order data structure 130 stores 120 dispatch indicators.
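The count can be computed directly, and a partial matrix needs some fixed mapping from each pair to a storage slot. The mapping below is a hypothetical packed lower-triangular layout, assumed for illustration only; the description does not specify how the pairs are ordered in storage.

```python
from math import comb


def pair_count(n_entries: int) -> int:
    """Number of unique entry pairs, C(N, 2) = N(N-1)/2."""
    return comb(n_entries, 2)


def pair_index(r: int, c: int) -> int:
    """Hypothetical slot of pair (r, c), r != c, in a packed triangular layout."""
    if r < c:
        r, c = c, r
    # Rows 1 .. r-1 occupy 1 + 2 + ... + (r-1) = r(r-1)/2 earlier slots.
    return r * (r - 1) // 2 + c


print(pair_count(4), pair_count(16))  # 6 120
```

With this layout the pairs of a 4-entry queue occupy slots 0 through 5 exactly once, so six storage bits suffice.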

FIG. 7 depicts a schematic diagram of one embodiment of a sequence 132 of data structure states of the dispatch order data structure 130 shown in FIG. 6. However, instead of showing the dispatch indicators as arrows, the illustrated dispatch order data structures 130 of FIG. 7 are shown as binary values. As a matter of convention, a binary “1” corresponds to a left arrow, and a binary “0” corresponds to an upward arrow. However, other embodiments may be implemented using a different convention. Other than using binary values for a limited number of dispatch indicators, the sequence 132 of queue operations for times T0-T4 is the same as described above for FIG. 5.
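A binary partial matrix fits naturally in a single bit vector. In this Python sketch, a plain integer stands in for the six stored bits; it assumes the packed slot layout `r * (r - 1) // 2 + c` for a pair with `r > c` and the FIG. 7 convention that "1" (left arrow) means the row entry is older. The class name `PackedAgeMatrix` is hypothetical.

```python
from typing import List


class PackedAgeMatrix:
    """One bit per entry pair, held in an int (bit = 1: row entry older)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.bits = 0  # all 0: lower index older, so Entry_0 is oldest at T0

    @staticmethod
    def _slot(r: int, c: int) -> int:
        return r * (r - 1) // 2 + c  # assumes r > c

    def is_older(self, a: int, b: int) -> bool:
        """True if entry a was written before entry b."""
        r, c = max(a, b), min(a, b)
        row_older = (self.bits >> self._slot(r, c)) & 1 == 1
        return row_older if a == r else not row_older

    def write(self, w: int) -> None:
        """Record a write to entry w, making it the newest entry."""
        for other in range(self.n):
            if other == w:
                continue
            r, c = max(w, other), min(w, other)
            bit = 1 << self._slot(r, c)
            if r == w:
                self.bits &= ~bit  # row w is newer: store 0 (upward arrow)
            else:
                self.bits |= bit   # column w is newer, row older: store 1

    def order(self) -> List[int]:
        rank = [sum(self.is_older(e, o) for o in range(self.n) if o != e)
                for e in range(self.n)]
        return sorted(range(self.n), key=lambda e: rank[e], reverse=True)


m = PackedAgeMatrix(4)
for entry in [2, 0, 3, 1]:  # the T1 .. T4 writes of FIG. 5
    m.write(entry)
print(m.order(), format(m.bits, "06b"))  # [2, 0, 3, 1] 010110
```

In the printed bit string the leftmost character is slot 5 (pair Entry_3/Entry_2) and the rightmost is slot 0 (pair Entry_1/Entry_0).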

FIG. 8 depicts a schematic block diagram of one embodiment of an instruction queue scheduler 140 which uses a dispatch order data structure 104 such as one of the dispatch order data structures 110, 120, or 130. In one embodiment, the scheduler 140 is implemented in a processor (not shown). The processor may be implemented in a reduced instruction set computer (RISC) design. Additionally, the processor may implement a design based on the MIPS instruction set architecture (ISA). However, alternative embodiments of the processor may implement other instruction set architectures. It should also be noted that other embodiments of the scheduler 140 may include fewer or more components than are shown in FIG. 8.

In conjunction with the scheduler 140, the processor also may include execution units (not shown) such as an arithmetic logic unit (ALU), a floating point unit (FPU), a load/store unit (LSU), and a memory management unit (MMU). In one embodiment, each of these execution units is coupled to the scheduler 140, which schedules instructions for execution by one of the execution units. Once an instruction is scheduled for execution, the instruction may be sent to the corresponding execution unit where it is stored in an instruction queue 102.

The illustrated scheduler 140 includes a queue 102, a mapper 142, and a queue controller 144. The mapper 142 is configured to issue one or more queue operations to insert new entries in the queue 102. In one embodiment, the mapper 142 dispatches up to two instructions per cycle to each issue queue 102. The queue controller 144 also interfaces with the queue 102 to update a dispatch order data structure 104 in response to a queue operation to insert a new entry in the queue 102.

In order to receive two instructions per cycle, each issue queue 102 has two write ports, which are designated as Port_0 and Port_1. Alternatively, the mapper 142 may dispatch a single instruction on one of the write ports. In other embodiments, the issue queue 102 may have more than two write ports. If multiple instructions are dispatched at the same time to multiple write ports, then the write ports may have a designated order to indicate the relative dispatch order of the instructions which are issued together. For example, an instruction issued on Port_0 may be designated as older than an instruction issued in the same cycle on Port_1. In one embodiment, write addresses are generated internally in each issue queue 102.
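The port-ordering example can be modeled by applying the writes of one cycle in port order, so the entry written on Port_1 ends up newer than the one written on Port_0. This Python sketch models only the resulting state, not single-cycle hardware timing; it assumes distinct write addresses on the two ports, the nested-list matrix used earlier, and a hypothetical `dispatch_cycle` helper that takes `None` for an idle port.

```python
from typing import List, Optional, Sequence

AgeMatrix = List[List[Optional[bool]]]  # older[r][c]: row entry r is older


def new_matrix(n: int) -> AgeMatrix:
    # Assumed initial state: Entry_0 oldest, Entry_{n-1} newest.
    return [[None if r == c else r < c for c in range(n)] for r in range(n)]


def write_entry(older: AgeMatrix, w: int) -> None:
    for c in range(len(older)):
        if c != w:
            older[w][c] = False
            older[c][w] = True


def dispatch_cycle(older: AgeMatrix, port_writes: Sequence[Optional[int]]) -> None:
    """Apply one cycle of writes; position 0 is Port_0, position 1 is Port_1.

    Writes are applied in port order, so the last port written is the newest.
    """
    for entry in port_writes:
        if entry is not None:
            write_entry(older, entry)


def dispatch_order(older: AgeMatrix) -> List[int]:
    n = len(older)
    rank = [sum(1 for c in range(n) if c != r and older[r][c]) for r in range(n)]
    return sorted(range(n), key=lambda r: rank[r], reverse=True)


m = new_matrix(4)
dispatch_cycle(m, [1, 0])  # Entry_1 on Port_0, Entry_0 on Port_1
print(dispatch_order(m))  # [2, 3, 1, 0]
```

After the cycle, Entry_1 (Port_0) is older than Entry_0 (Port_1), and both are newer than the entries that were not written.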

The queue controller 144 keeps track of the dispatch order of the entries in the issue queue 102 to determine which entries can be overwritten (or evicted). In order to track the dispatch order of the entries in the queue 102, the queue controller 144 includes dispatch logic 146 with least recently used (LRU) logic 148. The queue controller 144 also includes a bit mask vector 150 and an age matrix flop bank 152. In one embodiment, the flop bank 152 includes a plurality of flip-flops. Each flip-flop stores a bit value indicative of the dispatch order of the entries of a corresponding pair of entries. In other words, each flip-flop corresponds to a dispatch indicator, and the flop bank 152 implements the dispatch order data structure 104. The bit value of each flip-flop is a binary bit value. In one embodiment, a logical high value of the binary bit value indicates one dispatch order of the pair of entries (e.g., the corresponding row is older than the corresponding column), and a logical low value of the binary bit value indicates the reverse dispatch order of the pair of entries (e.g., the corresponding column is older than the corresponding row). When a dispatch indicator is updated in response to a new instruction written to the queue 102, the dispatch logic 146 is configured to potentially flip the binary bit value for the corresponding dispatch indicators. As described above, the number of flip-flops in the flop bank 152 may be determined by the number of pairs (e.g., combinations) of entries in the queue 102.

In order to determine which entries may be overwritten in the queue 102, the dispatch logic 146 includes the LRU logic 148 to implement an LRU replacement strategy. In one embodiment, the LRU replacement strategy is based, at least in part, on the dispatch indicators of the corresponding dispatch order data structure 104 implemented by the flop bank 152. As examples, the LRU logic 148 may implement a true LRU replacement strategy or a pseudo LRU replacement strategy. In a true LRU replacement strategy, the LRU entries in the queue 102 are replaced. The LRU entries are designated by LRU replacement addresses. However, generating the LRU replacement addresses, which is a serial operation, can be logically complex. A pseudo LRU replacement strategy approximates the true LRU replacement strategy using a less complicated implementation.
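For a true LRU choice, the replacement candidate is the entry whose dispatch indicators mark it as older than every other entry. The Python sketch below assumes the nested-list matrix used earlier; it is a software illustration of the selection rule, not a model of the LRU logic 148 or its timing, and it does not attempt a pseudo LRU approximation. `lru_entry` is a hypothetical name.

```python
from typing import List, Optional

AgeMatrix = List[List[Optional[bool]]]  # older[r][c]: row entry r is older


def matrix_from_order(order: List[int]) -> AgeMatrix:
    n = len(order)
    pos = {entry: i for i, entry in enumerate(order)}
    return [[None if r == c else pos[r] < pos[c] for c in range(n)] for r in range(n)]


def lru_entry(older: AgeMatrix) -> int:
    """Return the entry that is older than every other entry (true LRU)."""
    n = len(older)
    for r in range(n):
        # All indicators in row r say "row older than column".
        if all(older[r][c] for c in range(n) if c != r):
            return r
    raise ValueError("inconsistent age matrix: no entry is older than all others")


print(lru_entry(matrix_from_order([3, 0, 2, 1])))  # 3
```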

When the mapper 142 dispatches a new entry to the queue 102 as a part of a queue operation, the queue 102 interfaces with the queue controller 144 to determine which existing entry to discard to make room for the newly dispatched entry. In some embodiments, the dispatch logic 146 uses the age matrix flop bank 152 to determine which entry to replace based on the absolute dispatch order of the entries in the queue 102. However, in other embodiments, it may be useful to identify an entry to discard from among a subset of the entries in the queue 102.

As one example, some entries in the queue 102 may be associated with a replay operation, so it may be useful to maintain the corresponding entries in the queue 102, regardless of the dispatch order of the entries. Thus, the entry to be discarded may be selected from a subset that excludes the entries associated with the replay operation.

As another example, it may be useful to maintain certain entries in the queue 102 in order to prevent a hazard event such as a structural, data, or control hazard. Thus, the entry to be discarded may be selected from a subset that excludes the entries that, if discarded, would potentially create a hazard event.

As another example, it may be useful to preserve entries of the queue 102 that are related to a particular thread of a multi-threaded processing system. Thus, the entry to be discarded may be selected from a subset that excludes entries related to the identified thread. In this way, the preserved entries corresponding to the identified thread are given priority, because the entries associated with the thread are not discarded.

In order to identify a subset of the entries in the queue 102, the queue controller 144 may use one or more bit mask vectors 150. In one embodiment, each bit mask vector 150 is used to mask out one or more dispatch indicators of a dispatch order data structure 104 such as the age matrix flop bank 152. In other words, each bit mask vector 150 (or bit vector) is configured to store a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure 104. Thus, the queue controller 144 can exclude at least some of the entries of the queue 102 from a queue operation based on the mask values of the bit vector 150. For example, instead of selecting the absolute oldest entry of the queue 102 to be discarded, the dispatch logic 146 may select the oldest entry of the subset of entries that are not masked by the bit mask vector 150. In an alternative embodiment, the bit mask vector 150 is used to identify entries that may be discarded in a dispatch operation, rather than entries to be maintained in the queue 102 (i.e., excluded from potentially discarding) in a dispatch operation.
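Selecting the oldest unmasked entry can be sketched in Python as follows. For simplicity this example assumes one mask bit per queue entry (bit i = 1 preserves entry i), and treats masking an entry as ignoring every dispatch indicator in which that entry participates. It also assumes that separate replay, hazard, and thread masks are combined with a bitwise OR. These choices and the name `oldest_unmasked` are illustrative, not taken from the description.

```python
from typing import List, Optional

AgeMatrix = List[List[Optional[bool]]]  # older[r][c]: row entry r is older


def matrix_from_order(order: List[int]) -> AgeMatrix:
    n = len(order)
    pos = {entry: i for i, entry in enumerate(order)}
    return [[None if r == c else pos[r] < pos[c] for c in range(n)] for r in range(n)]


def oldest_unmasked(older: AgeMatrix, mask: int) -> Optional[int]:
    """Oldest entry whose mask bit is 0; None if every entry is masked."""
    n = len(older)
    candidates = [e for e in range(n) if not (mask >> e) & 1]
    for r in candidates:
        # Indicators pairing r with a masked entry are ignored.
        if all(older[r][c] for c in candidates if c != r):
            return r
    return None


older = matrix_from_order([3, 0, 2, 1])  # Entry_3 is the absolute oldest
replay_mask = 1 << 3                     # hypothetical: Entry_3 awaits replay
thread_mask = 1 << 0                     # hypothetical: Entry_0 is a preserved thread
print(oldest_unmasked(older, 0))                          # 3
print(oldest_unmasked(older, replay_mask | thread_mask))  # 2
```

For the alternative embodiment, in which the mask marks entries that may be discarded, the same routine can be called with the inverted mask (for example, `~allowed & ((1 << n) - 1)`).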

FIG. 9 depicts a schematic flow chart diagram of one embodiment of a queue operation method 160 for use with the instruction queue scheduler 140 of FIG. 8. Although the queue operation method 160 is described with reference to the instruction queue scheduler 140 of FIG. 8, other embodiments may be implemented in conjunction with other schedulers.

In the illustrated queue operation method 160, the queue controller 144 initializes 162 the dispatch order data structure 104. As described above, the queue controller 144 may initialize the dispatch order data structure 104 with a plurality of dispatch indicators based on the dispatch order of the entries in the queue 102. In this way, the dispatch order data structure 104 maintains an absolute dispatch order for the queue 102 to indicate the order in which the entries are written into the queue 102. Although some embodiments are described as using a particular type of dispatch order data structure 104 such as the age matrix, other embodiments may use other implementations of the dispatch order data structure.

The illustrated queue operation method 160 continues as the queue 102 receives 164 a command for a queue operation such as a dispatch operation. As explained above, the queue controller 144 selects an existing entry of the queue 102 to be discarded from all of the entries in the queue 102 or from a subset of the entries in the queue 102. In order to identify a subset of the entries in the queue 102, the queue controller 144 determines 166 if there is a bit mask vector 150 to use with the received queue operation. If there is a bit mask vector 150, then the dispatch logic 146 applies 168 the bit mask vector 150 to the dispatch order data structure 104 before executing 170 the queue operation. In this situation, the candidate entries that may be discarded from the queue 102 are limited to a subset of the entries in the queue 102. Otherwise, if there is no applicable bit mask vector 150, then the dispatch logic 146 may directly execute 170 the queue operation. In this situation, the candidate entries that may be discarded from the queue 102 are not limited to a subset of the entries in the queue 102. After executing 170 the queue operation, the dispatch logic 146 updates 172 the dispatch order data structure 104, and the depicted queue operation method 160 ends.
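The steps of method 160 can be strung together in a short Python sketch. It assumes a Python list standing in for the queue 102, the nested-list age matrix used earlier, and one mask bit per entry (bit i = 1 preserves entry i); `run_dispatch` and the other helpers are hypothetical names, and the step numbers in the comments refer to FIG. 9.

```python
from typing import List, Optional

AgeMatrix = List[List[Optional[bool]]]  # older[r][c]: row entry r is older


def matrix_from_order(order: List[int]) -> AgeMatrix:
    """Step 162: initialize the age matrix from an oldest-to-newest list."""
    n = len(order)
    pos = {entry: i for i, entry in enumerate(order)}
    return [[None if r == c else pos[r] < pos[c] for c in range(n)] for r in range(n)]


def oldest_unmasked(older: AgeMatrix, mask: int) -> Optional[int]:
    n = len(older)
    candidates = [e for e in range(n) if not (mask >> e) & 1]
    for r in candidates:
        if all(older[r][c] for c in candidates if c != r):
            return r
    return None


def write_entry(older: AgeMatrix, w: int) -> None:
    for c in range(len(older)):
        if c != w:
            older[w][c] = False
            older[c][w] = True


def run_dispatch(queue: List[str], older: AgeMatrix, instr: str,
                 mask: Optional[int] = None) -> int:
    """Steps 164-172 for one received dispatch; returns the overwritten entry."""
    # 166/168: with no applicable bit mask vector, every entry is a candidate.
    effective_mask = 0 if mask is None else mask
    victim = oldest_unmasked(older, effective_mask)
    if victim is None:
        raise RuntimeError("every entry is masked; nothing may be discarded")
    queue[victim] = instr       # 170: execute the queue operation
    write_entry(older, victim)  # 172: update the dispatch order data structure
    return victim


queue = ["a", "b", "c", "d"]
older = matrix_from_order([0, 1, 2, 3])
print(run_dispatch(queue, older, "e"))               # 0 (absolute oldest)
print(run_dispatch(queue, older, "f", mask=0b0010))  # 2 (Entry_1 is preserved)
print(queue)                                         # ['e', 'b', 'f', 'd']
```

In the second call Entry_1 is the absolute oldest, but the mask removes it from the candidates, so the oldest remaining entry, Entry_2, is overwritten instead.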

It should be noted that embodiments of the methods, operations, functions, and/or logic may be implemented in software, firmware, hardware, or some combination thereof. Additionally, some embodiments of the methods, operations, functions, and/or logic may be implemented using a hardware or software representation of one or more algorithms related to the operations described above. To the degree that an embodiment may be implemented in software, the methods, operations, functions, and/or logic are stored on a computer-readable medium and accessible by a computer processor.

As one example, an embodiment may be implemented as a computer readable storage medium embodying a program of machine-readable instructions, executable by a digital processor, to perform operations to facilitate queue allocation. The operations may include operations to store a plurality of dispatch indicators corresponding to pairs of entries in a queue. Each dispatch indicator is indicative of the dispatch order of the corresponding pair of entries. The operations also include operations to store a bit vector comprising a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure, and to perform a queue operation on a subset of the entries in the queue. The subset excludes at least some of the entries of the queue based on the mask values of the bit vector. Other embodiments of the computer readable storage medium may facilitate fewer or more operations.

Embodiments of the invention also may involve a number of functions to be performed by a computer processor such as a central processing unit (CPU), a graphics processing unit (GPU), or a microprocessor. The microprocessor may be a specialized or dedicated microprocessor that is configured to perform particular tasks by executing machine-readable software code that defines the particular tasks. The microprocessor also may be configured to operate and communicate with other devices such as direct memory access modules, memory storage devices, Internet related hardware, and other devices that relate to the transmission of data. The software code may be configured using software formats such as Java, C++, XML (Extensible Mark-up Language) and other languages that may be used to define functions that relate to operations of devices required to carry out the functional operations described herein. The code may be written in different forms and styles, many of which are known to those skilled in the art. Different code formats, code configurations, styles and forms of software programs and other means of configuring code to define the operations of a microprocessor may be implemented.

Within the different types of computers, such as computer servers, that utilize the invention, there exist different types of memory devices for storing and retrieving information while performing some or all of the functions described herein. In some embodiments, the memory/storage device where data is stored may be a separate device that is external to the processor, or may be configured in a monolithic device, where the memory or storage device is located on the same integrated circuit, such as components connected on a single substrate. Cache memory devices are often included in computers for use by the CPU or GPU as a convenient storage location for information that is frequently stored and retrieved. Similarly, a persistent memory is also frequently used with such computers for maintaining information that is frequently retrieved by a central processing unit, but that is not often altered within the persistent memory, unlike the cache memory. Main memory is also usually included for storing and retrieving larger amounts of information such as data and software applications configured to perform certain functions when executed by the central processing unit. These memory devices may be configured as random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), flash memory, and other memory storage devices that may be accessed by a central processing unit to store and retrieve information. Embodiments may be implemented with various memory and storage devices, as well as any commonly used protocol for storing and retrieving information to and from these memory devices respectively.

Although the operations of the method(s) herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations. In another embodiment, instructions or sub-operations of distinct operations may be implemented in an intermittent and/or alternating manner.

Although specific embodiments of the invention have been described and illustrated, the invention is not to be limited to the specific forms or arrangements of parts so described and illustrated. The scope of the invention is to be defined by the claims appended hereto and their equivalents.

1. An apparatus for queue allocation, the apparatus comprising: adispatch order data structure corresponding to a queue, the dispatchorder data structure to store a plurality of dispatch indicatorsassociated with a plurality of pairs of entries of the queue to indicatea write order of the entries in the queue; a bit vector to store aplurality of mask values corresponding to the dispatch indicators of thedispatch order data structure; and a queue controller to interface withthe queue and the dispatch order data structure, the queue controller toexclude at least some of the entries from a queue operation based on themask values of the bit vector.
 2. The apparatus according to claim 1,wherein the queue operation comprises a dispatch operation to write anew entry in the queue.
 3. The apparatus according to claim 1, whereinthe mask values of the bit vector comprise a replay mask to mask adispatch indictor for an entry of the queue associated with a replayoperation.
 4. The apparatus according to claim 1, wherein the mask values of the bit vector comprise an atomic flush mask to mask a dispatch indicator for an entry of the queue associated with an atomic flush operation.
 5. The apparatus according to claim 1, wherein the mask values of the bit vector comprise a hazard mask to mask a dispatch indicator for an entry of the queue associated with prevention of a hazard event.
 6. The apparatus according to claim 5, wherein the hazard event comprises a structural hazard event.
 7. The apparatus according to claim 5, wherein the hazard event comprises a data hazard event.
 8. The apparatus according to claim 5, wherein the hazard event comprises a control hazard event.
 9. The apparatus according to claim 1, wherein the mask values of the bit vector comprise a thread mask to mask a subset of dispatch indicators for corresponding entries of the queue associated with a thread of a plurality of threads in a multi-threaded processing system.
 10. The apparatus according to claim 1, further comprising a flop bank with a plurality of flip-flops, each flip-flop to store a bit value indicative of the dispatch order of the entries of a corresponding pair of entries.
 11. The apparatus according to claim 10, the queue controller further comprising dispatch logic to interface with the dispatch order data structure, the dispatch logic to flip the bit value for at least one of the dispatch indicators in response to the queue operation to write the new entry in the queue.
 12. The apparatus according to claim 11, further comprising a random access memory (RAM) device to store the queue and the dispatch order data structure, wherein the queue comprises a fully associative RAM structure and the dispatch order data structure comprises a control structure separate from the fully associative RAM structure.
 13. The apparatus according to claim 1, further comprising a mapper coupled to the queue, the mapper to dispatch the queue operation to insert a new entry in the queue.
 14. The apparatus according to claim 1, the queue controller further comprising least recently used (LRU) logic, the LRU logic to implement a queue entry replacement strategy for the queue based on the dispatch order data structure, wherein the queue entry replacement strategy comprises a true LRU replacement strategy or a pseudo LRU replacement strategy.
 15. A method for managing a dispatch order of entries in a queue, the method comprising: storing a plurality of dispatch indicators corresponding to pairs of entries in a queue, each dispatch indicator indicative of the dispatch order of the corresponding pair of entries; storing a bit vector comprising a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure; and performing a queue operation on a subset of the entries in the queue, wherein the subset excludes at least some of the entries of the queue based on the mask values of the bit vector.
 16. The method according to claim 15, wherein performing the queue operation comprises dispatching a new entry into the queue.
 17. The method according to claim 16, further comprising masking a replay instruction stored in an entry of the queue to avoid dispatching the new entry in the location of the replay instruction.
 18. The method according to claim 16, further comprising masking an instruction stored in an entry of the queue from an atomic flush operation to flush a plurality of instructions from the queue.
 19. The method according to claim 16, further comprising masking an instruction stored in an entry of the queue to prevent a hazard event.
 20. The method according to claim 19, wherein the hazard event comprises a structural hazard, a data hazard, or a control hazard.
 21. The method according to claim 16, further comprising masking a plurality of instructions associated with a first thread to give priority to instructions associated with a second thread.
 22. The method according to claim 15, further comprising storing the dispatch indicators in a dispatch order data structure corresponding to a representation of at least a partial matrix with intersecting rows and columns, each row corresponding to one of the entries of the queue and each column corresponding to one of the entries of the queue, the intersections of the rows and columns corresponding to the pairs of entries in the queue.
 23. The method according to claim 15, further comprising storing the dispatch indicators in a plurality of flip-flops of a flop bank, each flip-flop comprising a bit value indicative of the dispatch order of the corresponding pair of entries.
 24. The method according to claim 15, further comprising implementing a least recently used (LRU) replacement strategy for the queue based on at least some of the dispatch indicators.
 25. A computer readable storage medium embodying a program of machine-readable instructions, executable by a digital processor, to perform operations to facilitate queue allocation, the operations comprising: store a plurality of dispatch indicators corresponding to pairs of entries in a queue, each dispatch indicator indicative of the dispatch order of the corresponding pair of entries; store a bit vector comprising a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure; and perform a queue operation on a subset of the entries in the queue, wherein the subset excludes at least some of the entries of the queue based on the mask values of the bit vector.
 26. The computer readable storage medium according to claim 25, the operations further comprising an operation to dispatch a new entry into the queue.
 27. The computer readable storage medium according to claim 25, the operations further comprising an operation to mask a replay instruction stored in an entry of the queue to avoid dispatching the new entry in the location of the replay instruction.
 28. The computer readable storage medium according to claim 25, the operations further comprising an operation to mask an instruction stored in an entry of the queue from an atomic flush operation to flush a plurality of instructions from the queue.
 29. The computer readable storage medium according to claim 25, the operations further comprising an operation to mask an instruction stored in an entry of the queue to prevent a hazard event.
 30. The computer readable storage medium according to claim 29, the operations further comprising an operation to mask a plurality of instructions associated with a first thread to give priority to instructions associated with a second thread.
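The interplay of the dispatch order data structure and the mask bit vector described in claims 1, 2, 14, 17 and 22 can be illustrated with a minimal software sketch. It is a hypothetical model, not the claimed hardware: the class `AgeMatrix`, its `older` matrix and the `write`, `oldest` and `dispatch` methods are illustrative names. It assumes a full N×N boolean matrix rather than a partial (triangular) matrix or flop bank, and a Python list of booleans standing in for the mask bit vector. It also assumes an arbitrary initial order in which slot 0 is oldest. A dispatch writes the new entry into the oldest unmasked slot (an LRU-style victim choice) and then marks that slot as the youngest entry.

```python
from typing import List, Optional


class AgeMatrix:
    """Hypothetical age-matrix model: older[i][j] is True when entry i
    was written before entry j (one dispatch indicator per ordered pair)."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Assumed initial order: slot 0 is oldest, slot n-1 is youngest.
        self.older = [[i < j for j in range(n)] for i in range(n)]

    def write(self, k: int) -> None:
        """Record a write to slot k, making it the youngest entry."""
        for j in range(self.n):
            if j != k:
                self.older[k][j] = False  # k is not older than any entry
                self.older[j][k] = True   # every other entry is older than k

    def oldest(self, mask: List[bool]) -> Optional[int]:
        """Return the oldest slot whose mask bit is clear, or None.
        Masked slots are excluded from the comparison entirely."""
        candidates = [i for i in range(self.n) if not mask[i]]
        for i in candidates:
            if all(self.older[i][j] for j in candidates if j != i):
                return i
        return None

    def dispatch(self, mask: List[bool]) -> Optional[int]:
        """Write a new entry into the oldest unmasked slot (LRU victim)."""
        victim = self.oldest(mask)
        if victim is not None:
            self.write(victim)
        return victim


if __name__ == "__main__":
    q = AgeMatrix(4)
    print(q.dispatch([False] * 4))                  # slot 0 is replaced first
    # Slot 1 holds a replay instruction, so it is masked from allocation.
    print(q.dispatch([False, True, False, False]))  # slot 1 is skipped
```

In this sketch the mask never changes the stored order. It only removes entries from the comparison, so a replayed entry keeps its age and becomes eligible again once its mask bit is cleared. A thread mask or atomic flush mask would be modeled the same way, by setting the bits for the affected slots. The `oldest` scan is O(N²) in software. A hardware implementation would typically evaluate each row's AND-reduction in parallel.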